AITopics | layer depth

Grounding Representation Similarity with Statistical Testing

Neural Information Processing Systems, Apr-24-2026, 15:49:12 GMT

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures. Unfortunately, these widely used measures often disagree on fundamental observations, such as whether deep networks differing only in random initialization learn similar representations. These disagreements raise the question: which, if any, of these dissimilarity measures should we believe? We provide a framework to ground this question through a concrete test: measures should have sensitivity to changes that affect functional behavior, and specificity against changes that do not. We quantify this through a variety of functional behaviors including probing accuracy and robustness to distribution shift, and examine changes such as varying random initialization and deleting principal components. We find that current metrics exhibit different weaknesses, note that a classical baseline performs surprisingly well, and highlight settings where all metrics appear to fail, thus providing a challenge set for further improvement.
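To make two of the measures concrete, here is a minimal NumPy sketch of linear CKA and of orthogonal Procrustes distance, a classical shape-comparison measure. The abstract does not name its classical baseline, so the Procrustes function is only an illustration of that kind of measure. Both functions assume X and Y are (n_examples × width) activation matrices recorded on the same inputs; the function names are hypothetical.

```python
import numpy as np


def _center(X):
    # Subtract each feature's mean over examples (column-centering).
    return X - X.mean(axis=0, keepdims=True)


def linear_cka(X, Y):
    """Linear CKA similarity in [0, 1]; X is (n, p1), Y is (n, p2)."""
    X, Y = _center(X), _center(Y)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))


def procrustes_distance(X, Y):
    """Orthogonal Procrustes distance after centering and unit Frobenius scaling.

    For equal widths this equals min over orthogonal Q of ||X - Y Q||_F^2, so 0 means
    identical up to translation, isotropic scaling and rotation/reflection.
    """
    X, Y = _center(X), _center(Y)
    X = X / np.linalg.norm(X, "fro")
    Y = Y / np.linalg.norm(Y, "fro")
    # ||X||_F^2 + ||Y||_F^2 - 2 * nuclear norm of X^T Y, with both squared norms equal to 1
    return 2.0 - 2.0 * np.linalg.norm(X.T @ Y, "nuc")
```

A dissimilarity can be read off as 1 - linear_cka(X, Y). Whether either number has the sensitivity and specificity the abstract asks for is an empirical question that this snippet does not answer.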

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.57)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Learning Structured Sparsity in Deep Neural Networks

Neural Information Processing Systems, Mar-17-2026, 07:55:57 GMT

High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) on resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN's evaluation; and (3) regularize the DNN structure to improve classification accuracy. Experimental results show that SSL achieves on average 5.1X and 3.1X speedups of convolutional layer computation of AlexNet on CPU and GPU, respectively, with off-the-shelf libraries. These speedups are about twice those of non-structured sparsity. For CIFAR-10, regularization on layer depth reduces a 20-layer Deep Residual Network (ResNet) to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still higher than that of the original ResNet with 32 layers.
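The abstract does not spell out the regularizer. The sketch below assumes a group-Lasso penalty: each group of weights (one filter, one input channel, one filter-shape position, or a whole layer) contributes its L2 norm, which pushes optimization toward zeroing entire groups rather than scattered individual weights. The function names and the lambda weights are hypothetical.

```python
import numpy as np


def group_norms(W):
    """Per-group L2 norms for conv weights W of shape (out_ch, in_ch, kh, kw)."""
    out_ch, in_ch = W.shape[:2]
    flat = W.reshape(out_ch, -1)  # one row per output filter
    return {
        # one group per output filter: W[n, :, :, :]
        "filter": np.linalg.norm(flat, axis=1),
        # one group per input channel: W[:, c, :, :]
        "channel": np.linalg.norm(W.transpose(1, 0, 2, 3).reshape(in_ch, -1), axis=1),
        # one group per filter-shape position: W[:, c, h, w]
        "shape": np.linalg.norm(flat, axis=0),
        # the whole layer as a single group (the depth-wise case)
        "depth": np.array([np.linalg.norm(W)]),
    }


def structured_sparsity_penalty(W, lambdas):
    """Weighted sum of group-Lasso terms; lambdas maps group type -> weight."""
    norms = group_norms(W)
    return float(sum(lam * norms[kind].sum() for kind, lam in lambdas.items()))


def count_zero_filters(W, tol=1e-8):
    """Filters whose weights are all (near) zero can be removed from the layer."""
    return int((group_norms(W)["filter"] <= tol).sum())
```

In a training loop this penalty would be added to the data loss for each convolutional layer. Removing an all-zero filter also removes the matching input channel of the next layer, which is where the dense, hardware-friendly savings come from.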

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Neural Information Processing Systems, Dec-25-2025, 14:25:53 GMT

Neural network models for NLP are typically implemented without the explicit encoding of language rules, and yet they break one performance record after another. This has generated a lot of research interest in interpreting the representations learned by these networks. We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.
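The abstract does not say how embeddings are compared with brain recordings. One common setup, assumed here rather than taken from the paper, is a linear encoding model: fit ridge regression from one layer's embeddings to the recorded signal on training stimuli, then score held-out predictions by per-voxel (or per-sensor) Pearson correlation. Repeating this per layer gives one way to compare layer depths. All names below are hypothetical placeholders.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping features X (n, d) to targets Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0)) + 1e-12
    return (A * B).sum(axis=0) / denom


def score_layers(layer_embeddings, brain, n_train, alpha=1.0):
    """Mean held-out correlation per layer; rows of all arrays are aligned stimuli."""
    scores = {}
    for name, X in layer_embeddings.items():
        X_tr, X_te = X[:n_train], X[n_train:]
        Y_tr, Y_te = brain[:n_train], brain[n_train:]
        mu_x, mu_y = X_tr.mean(axis=0), Y_tr.mean(axis=0)  # training means only
        W = fit_ridge(X_tr - mu_x, Y_tr - mu_y, alpha)
        pred = (X_te - mu_x) @ W + mu_y
        scores[name] = float(columnwise_pearson(Y_te, pred).mean())
    return scores
```

In practice alpha would typically be chosen by cross-validation, and train/test splits would follow contiguous blocks of text, since neighboring time points in brain recordings are correlated.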

brain, name change, natural-language processing

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.77)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Learning Structured Sparsity in Deep Neural Networks

Neural Information Processing Systems, Nov-21-2025, 14:36:34 GMT

High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) on resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNN's evaluation; and (3) regularize the DNN structure to improve classification accuracy. Experimental results show that SSL achieves on average 5.1X and 3.1X speedups of convolutional layer computation of AlexNet on CPU and GPU, respectively, with off-the-shelf libraries. These speedups are about twice those of non-structured sparsity. For CIFAR-10, regularization on layer depth reduces a 20-layer Deep Residual Network (ResNet) to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still higher than that of the original ResNet with 32 layers.

deep neural network, learning structured sparsity, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

5c8168a8eca2eb23f6b1f5019371043e-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 03:50:30 GMT

attack layer depth, dataset, layer depth

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TrackFormers Part 2: Enhanced Transformer-Based Models for High-Energy Physics Track Reconstruction

Caron, Sascha, Dobreva, Nadezhda, Kimpel, Maarten, Odyurt, Uraz, Pshenov, Slav, Bazan, Roberto Ruiz de Austri, Shalugin, Eugene, Wolffs, Zef, Zhao, Yue

arXiv.org Artificial Intelligence, Oct-1-2025

High-energy physics experiments are producing rapidly growing volumes of data, a trend that will intensify with the upcoming High-Luminosity LHC upgrade. This surge in data necessitates critical revisions across the data processing pipeline, with particle track reconstruction being a prime candidate for improvement. In our previous work, we introduced "TrackFormers", a collection of Transformer-based one-shot encoder-only models that effectively associate hits with expected tracks. In this study, we extend our earlier efforts by incorporating loss functions that account for inter-hit correlations, conducting detailed investigations into various Transformer attention mechanisms, and studying the reconstruction of higher-level objects. Furthermore, we discuss new datasets that allow training at the hit level for a range of physics processes. These developments collectively aim to boost both the accuracy and, potentially, the efficiency of our tracking models, offering a robust solution to meet the demands of next-generation high-energy physics experiments.
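The abstract only names the ingredients. As a hedged illustration of an encoder-only, one-shot hit-to-track model with a loss that looks at pairs of hits, here is a NumPy sketch under stated assumptions: one self-attention layer over all hits of an event, a per-hit classifier over a fixed number of track slots, and a hypothetical pairwise loss comparing the predicted probability that two hits share a slot with whether they truly share a track. None of these names or design choices are taken from TrackFormers itself.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_features, d_model, n_slots, seed=0):
    """Random weights for a toy one-layer encoder (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    shapes = {"W_in": (n_features, d_model), "Wq": (d_model, d_model),
              "Wk": (d_model, d_model), "Wv": (d_model, d_model),
              "W_out": (d_model, n_slots)}
    return {k: rng.normal(scale=1.0 / np.sqrt(s[0]), size=s) for k, s in shapes.items()}


def track_logits(hits, params):
    """hits: (n_hits, n_features), e.g. hit coordinates; returns (n_hits, n_slots)."""
    H = hits @ params["W_in"]
    Q, K, V = H @ params["Wq"], H @ params["Wk"], H @ params["Wv"]
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # every hit attends to every hit
    H = H + A @ V                               # residual connection
    return H @ params["W_out"]


def pairwise_same_track_loss(logits, track_ids, eps=1e-9):
    """Binary cross-entropy on 'hits i and j share a track' over all pairs i != j."""
    P = softmax(logits)
    S = np.clip(P @ P.T, eps, 1.0 - eps)  # probability that i and j land in the same slot
    T = (track_ids[:, None] == track_ids[None, :]).astype(float)
    off_diag = ~np.eye(len(track_ids), dtype=bool)
    bce = -(T * np.log(S) + (1.0 - T) * np.log(1.0 - S))
    return float(bce[off_diag].mean())


def assign_tracks(logits):
    """One-shot assignment: each hit goes to its highest-scoring slot."""
    return logits.argmax(axis=1)
```

Because this pairwise loss only asks whether two hits share a slot, it does not depend on which slot index a given track ends up in, so no fixed label order for tracks is needed.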

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2509.26411

Country:

Europe > Netherlands (0.32)
Europe > Italy > Sardinia (0.14)

Genre: Research Report > New Finding (0.35)

Industry: Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

From Parameters to Performance: A Data-Driven Study on LLM Structure and Development

Wang, Suqing, Li, Zuchao, Shi, Luohe, Du, Bo, Zhao, Hai, Li, Yun, Wang, Qianren

arXiv.org Artificial Intelligence, Sep-24-2025

Large language models (LLMs) have achieved remarkable success across various domains, driving significant technological advancements and innovations. Despite the rapid growth in model scale and capability, systematic, data-driven research on how structural configurations affect performance remains scarce. To address this gap, we present a large-scale dataset encompassing diverse open-source LLM structures and their performance across multiple benchmarks. Leveraging this dataset, we conduct a systematic, data mining-driven analysis to validate and quantify the relationship between structural configurations and performance. Our study begins with a review of the historical development of LLMs and an exploration of potential future trends. We then analyze how various structural choices impact performance across benchmarks and further corroborate our findings using mechanistic interpretability techniques. By providing data-driven insights into LLM optimization, our work aims to guide the targeted development and application of future models. We will release our dataset at https://huggingface.co/datasets/DX0369/LLM-Structure-Performance-Dataset
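The abstract gives neither the dataset schema nor the specific models used. As a minimal sketch of the kind of structure-versus-performance analysis it describes, the snippet fits an ordinary-least-squares model of a benchmark score on a few structural features. The field names (for example num_layers, log_params) are hypothetical, and the linear fit is an assumption, not the paper's method.

```python
import numpy as np


def fit_structure_model(configs, scores, features):
    """OLS fit: score ~ intercept + sum_k coef_k * config[feature_k]."""
    X = np.array([[float(c[f]) for f in features] for c in configs])
    X = np.column_stack([np.ones(len(configs)), X])  # intercept column
    y = np.asarray(scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", *features], coef.tolist()))


def predict(model, config):
    """Apply a fitted model to one structural configuration."""
    return model["intercept"] + sum(
        w * float(config[name]) for name, w in model.items() if name != "intercept"
    )
```

Structural features tend to be correlated across released models (deeper models are often also wider), so individual coefficients from a fit like this should be interpreted with care.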

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2509.18136

Country: Asia > China (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

The Forward-Forward Algorithm: Characterizing Training Behavior

Adamson, Reece

arXiv.org Artificial Intelligence, Apr-16-2025

The Forward-Forward algorithm is an alternative learning method that uses two forward passes instead of the forward and backward passes employed by backpropagation. Forward-Forward networks employ layer-local loss functions, each optimized from that layer's activations on every forward pass, rather than a single global objective function. This work explores how model and layer accuracy change in Forward-Forward networks as training progresses, in pursuit of a mechanistic understanding of their internal behavior. Treatments are applied to various system characteristics to investigate how layer and overall model accuracy change as training progresses, how layer depth affects accuracy, and how strongly individual layer accuracy correlates with overall model accuracy. The empirical results suggest that deeper layers in Forward-Forward networks experience a delay in accuracy improvement relative to shallower layers, and that shallower-layer accuracy is strongly correlated with overall model accuracy.
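To make "layer-local loss" concrete, here is a NumPy sketch of training one Forward-Forward layer. It assumes the goodness function from Hinton's original proposal (the sum of squared ReLU activations) and a logistic loss that pushes goodness above a threshold theta for positive data and below it for negative data. The gradient uses only this layer's inputs and activations, so no error signal flows back from deeper layers. Function names, theta, and the learning rate are illustrative choices, not taken from this paper.

```python
import numpy as np


def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))  # numerically stable logistic


def ff_loss_and_grads(W, b, X, y, theta):
    """Layer-local Forward-Forward loss for one ReLU layer.

    X: (n, d) layer inputs; y: (n,) with 1 for positive and 0 for negative data.
    """
    z = X @ W + b
    h = np.maximum(z, 0.0)
    goodness = (h ** 2).sum(axis=1)
    logit = goodness - theta
    # softplus(-logit) for positives, softplus(logit) for negatives
    loss = np.mean(y * np.logaddexp(0.0, -logit) + (1 - y) * np.logaddexp(0.0, logit))
    dlogit = (sigmoid(logit) - y) / len(y)
    dz = dlogit[:, None] * 2.0 * h  # d(goodness)/dz = 2h, which is zero where ReLU is off
    return float(loss), X.T @ dz, dz.sum(axis=0)


def train_layer(X, y, n_hidden, theta=2.0, lr=0.03, steps=200, seed=0):
    """Plain gradient descent on the layer-local loss; returns weights and loss history."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    history = []
    for _ in range(steps):
        loss, dW, db = ff_loss_and_grads(W, b, X, y, theta)
        W -= lr * dW
        b -= lr * db
        history.append(loss)
    return W, b, history


def normalize_rows(H, eps=1e-8):
    """Length-normalize activations before the next layer, so that layer cannot
    simply reuse this layer's goodness."""
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
```

A deeper layer would be trained the same way on normalize_rows of the previous layer's activations, which is one place where the delayed improvement of deeper layers reported above could be observed.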

accuracy, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2504.11229

Country: North America (0.46)

Genre: Research Report > New Finding (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Reviews: Brains on Beats

Neural Information Processing Systems, Jan-20-2025, 18:43:57 GMT

This is an interesting study that follows a line of studies in the visual system linking the representation of stimuli in different layers of artificial neural networks to the representation in different stages of biological neural processing. The authors claim (and I do not dispute this claim) that this is the first such study in the auditory system, which makes it novel and potentially impactful. My main concern is the dimensionality of the comparisons, and in particular the searchlight approach. The images in Figure 1 are indeed compelling, but they have the potential to be misleading. First, we humans tend to look for patterns in images, so it is important to provide more objective summaries.

quantification, representation, review

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Neural Information Processing Systems, Oct-10-2024, 07:38:34 GMT

Neural network models for NLP are typically implemented without the explicit encoding of language rules, and yet they break one performance record after another. This has generated a lot of research interest in interpreting the representations learned by these networks. We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.

brain, layer depth, natural-language processing

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)
